With the increasing capabilities of large language models (LLMs), in-context learning (ICL) has become a new paradigm for natural language processing (NLP), where LLMs make predictions based only on contexts augmented with a few training examples. Exploring ICL to evaluate and extrapolate the abilities of LLMs has become a new trend. In this paper, we aim to survey and summarize the progress, challenges, and future directions of ICL. We first present a formal definition of ICL and clarify its relation to related studies. Then, we organize and discuss advanced techniques for ICL, including training strategies, prompting strategies, and more. Finally, we present the challenges of ICL and provide potential directions for further research. We hope our work encourages more research on uncovering how ICL works and on improving ICL.
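The core mechanism the survey describes can be sketched in a few lines: a prompt is built from a handful of labeled demonstrations plus the test input, and the model predicts from context alone, with no parameter updates. The task, field names, and labels below are invented for illustration, not taken from the survey.

```python
# Minimal in-context learning prompt construction: labeled demonstrations
# are concatenated ahead of the test input; the LLM completes the pattern.

def build_icl_prompt(demonstrations, test_input):
    """Concatenate (input, label) demonstrations followed by the test input."""
    lines = []
    for text, label in demonstrations:
        lines.append(f"Review: {text}\nSentiment: {label}")
    lines.append(f"Review: {test_input}\nSentiment:")  # model fills this in
    return "\n\n".join(lines)

demos = [("A delightful film.", "positive"),
         ("Utterly boring.", "negative")]
prompt = build_icl_prompt(demos, "An instant classic.")
print(prompt)
```

The resulting string would be sent verbatim to an LLM; everything the model "learns" lives in this context window.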
Deep-learning-based technologies such as deepfakes, particularly those used to synthesize forged face images, have attracted widespread attention in both society and academia. These automatic, professional-skill-free face manipulation technologies can replace the face in an original image or video with any target while maintaining its expression and demeanor. Since human faces are closely tied to identity, maliciously disseminated identity-manipulated videos could trigger a crisis of public trust in the media and could even have serious political, social, and legal implications. To effectively detect manipulated videos, we focus on the position offset introduced during face blending, which results from the forced affine transformation of the normalized forged face. We introduce a method for detecting manipulated videos based on the trajectory of facial region displacement. Specifically, we develop a virtual-anchor-based method for extracting the facial trajectory, which robustly represents displacement information. This information is used to construct a network for exposing multidimensional artifacts in the trajectory sequences of manipulated videos, built on dual-stream spatial-temporal graph attention and a gated recurrent unit backbone. Testing on various manipulation datasets demonstrates that our method's accuracy and generalization ability are competitive with those of leading detection methods.
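To make the notion of a "facial region displacement trajectory" concrete, here is a simplified sketch: each frame's facial region is summarized by the centroid of its landmarks, and frame-to-frame displacement forms the trajectory fed to a temporal model. The centroid summary is a stand-in for the paper's virtual-anchor extraction, and all shapes are illustrative.

```python
import numpy as np

# Simplified displacement-trajectory extraction: per-frame landmark centroids
# followed by first-order temporal differences.

def displacement_trajectory(landmarks):
    """landmarks: (T, K, 2) array of K facial landmarks over T frames."""
    centroids = landmarks.mean(axis=1)     # (T, 2) per-frame face center
    return np.diff(centroids, axis=0)      # (T-1, 2) displacement vectors

rng = np.random.default_rng(0)
lms = rng.normal(size=(5, 68, 2))          # 5 frames, 68 landmarks each
traj = displacement_trajectory(lms)
print(traj.shape)
```

A blending-induced affine offset would show up as a systematic perturbation in such a sequence, which is the signal the detection network looks for.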
We propose coordinated guiding vector fields to achieve two tasks simultaneously with a team of robots: first, the guidance and navigation of multiple robots to possibly different paths or surfaces, possibly embedded in 2D or 3D; second, their motion coordination while tracking the prescribed paths or surfaces. Motion coordination is defined by desired parametric displacements between robots on the paths or surfaces. This desired displacement is achieved by controlling the virtual coordinates, corresponding to the path or surface parameters, among the guiding vector fields. Rigorous mathematical guarantees, underpinned by dynamical systems theory and Lyapunov theory, are provided for effective distributed motion coordination and navigation of robots on paths or surfaces from all initial positions. As an example of a practical robotic application, we derive a control algorithm from the proposed guiding vector fields for a Dubins-car-like model with actuation saturation. Our proposed algorithm is distributed and scalable to an arbitrary number of robots. Furthermore, extensive illustrative simulations and outdoor experiments with fixed-wing aircraft validate the effectiveness and robustness of our algorithm.
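The single-robot building block of a guiding vector field can be sketched for a circular 2D path: a tangential term drives travel along the path while a gradient term corrects the level-set error. The tangential-plus-correction form and the unit gain below are standard illustrative choices, not the paper's exact multi-robot construction.

```python
import numpy as np

# Guiding vector field sketch for the unit circle phi(x, y) = x^2 + y^2 - 1 = 0.

def guiding_vector_field(p, k=1.0):
    x, y = p
    phi = x**2 + y**2 - 1.0                  # signed path-level error
    grad = np.array([2 * x, 2 * y])          # gradient of phi
    E = np.array([[0.0, -1.0], [1.0, 0.0]])  # 90-degree rotation matrix
    return E @ grad - k * phi * grad         # tangent term minus error correction

v = guiding_vector_field(np.array([2.0, 0.0]))
print(v)  # large negative x-component pulls the robot back toward the circle
```

In the multi-robot setting, an extra virtual coordinate per robot (the path parameter) is controlled on top of such fields to enforce the desired parametric displacements.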
One-shot voice conversion (VC), which uses only a single utterance from the target speaker as reference, has become a hot research topic. Existing works generally disentangle timbre, while information about pitch, rhythm, and content remains mixed together. To further disentangle these speech components and perform one-shot VC effectively, we employ random resampling for the pitch and content encoders, and use the variational contrastive log-ratio upper bound of mutual information together with gradient-reversal-layer-based adversarial mutual information learning to ensure that different parts of the latent space contain only the desired disentangled representations during training. Experiments on the VCTK dataset show that our model achieves state-of-the-art performance for one-shot VC in terms of naturalness and intelligibility. In addition, speech representation disentanglement allows us to transfer timbre, pitch, and rhythm separately in one-shot VC. Our code, pre-trained models, and demos are available at https://im1eon.github.io/is2022-Srdvc/.
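The random-resampling trick can be sketched in isolation: stretching a feature sequence to a random length perturbs rhythm and duration cues so an encoder cannot rely on them, while the content ordering survives. The linear interpolation and stretch range below are simplifying assumptions for illustration.

```python
import numpy as np

# Random resampling of a 1-D feature sequence to a random length, destroying
# rhythm/duration information while preserving content order.

def random_resample(seq, rng, lo=0.5, hi=1.5):
    """Linearly resample seq to a random length in [lo*T, hi*T]."""
    T = len(seq)
    new_T = max(2, int(T * rng.uniform(lo, hi)))
    old_t = np.linspace(0.0, 1.0, T)
    new_t = np.linspace(0.0, 1.0, new_T)
    return np.interp(new_t, old_t, seq)

rng = np.random.default_rng(0)
seq = np.sin(np.linspace(0, 3, 100))   # toy feature track
out = random_resample(seq, rng)
print(len(seq), len(out))
```

Applying this augmentation before the pitch and content encoders forces them to encode what survives resampling, leaving rhythm to be carried elsewhere.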
Learning with sparse rewards is usually inefficient in reinforcement learning (RL). Hindsight Experience Replay (HER) has been shown to be an effective solution to the low sample efficiency caused by sparse rewards, via goal relabeling. However, HER still has an implicit virtual-positive sparse reward problem caused by the achieved goals, especially for robot manipulation tasks. To solve this problem, we propose a novel model-free continual RL algorithm called Relay-HER (RHER). The proposed method first decomposes and rearranges the original long-horizon task into new sub-tasks of incremental complexity. Subsequently, a multi-task network is designed to learn the sub-tasks in ascending order of complexity. To solve the virtual-positive sparse reward problem, we propose a Random-Mixed Exploration strategy (RME), in which the achieved goals of sub-tasks of higher complexity are quickly changed under the guidance of those of lower complexity. Experimental results show significant improvements in the sample efficiency of RHER compared with vanilla HER on five typical robot manipulation tasks, including Push, PickAndPlace, Drawer, Insert, and ObstaclePush. The proposed RHER method has also been applied to learn a contact-rich push task on a physical robot from scratch, reaching a success rate of 10/10 with only 250 episodes.
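The goal-relabeling mechanism that HER (and hence RHER) builds on is simple enough to sketch: a failed trajectory is relabeled with a goal it actually achieved, turning a sparse zero-reward episode into a useful positive example. The reward convention (0 on success, -1 otherwise) and the tuple layout are common illustrative choices, not RHER's exact implementation.

```python
# Hindsight goal relabeling: pretend the final achieved goal was the desired
# goal all along, and recompute rewards accordingly.

def relabel_with_hindsight(transitions):
    """transitions: list of (state, action, achieved_goal, desired_goal)."""
    final_achieved = transitions[-1][2]          # the goal actually reached
    relabeled = []
    for state, action, achieved, _ in transitions:
        reward = 0.0 if achieved == final_achieved else -1.0
        relabeled.append((state, action, achieved, final_achieved, reward))
    return relabeled

episode = [("s0", "a0", (0, 0), (5, 5)),
           ("s1", "a1", (1, 0), (5, 5)),
           ("s2", "a2", (2, 1), (5, 5))]
print(relabel_with_hindsight(episode)[-1])       # last step now earns reward 0
```

RHER layers its sub-task decomposition and Random-Mixed Exploration on top of exactly this kind of relabeled replay buffer.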
Batch normalization (BN), widely used in modern neural networks, has been shown to encode domain-related knowledge and is therefore ineffective for cross-domain tasks such as unsupervised domain adaptation (UDA). Existing BN-variant methods aggregate source- and target-domain knowledge in the same channels of the normalization module. However, the misalignment between the features of corresponding channels across domains often leads to sub-optimal transferability. In this paper, we exploit cross-domain relations and propose a novel normalization method, Reciprocal Normalization (RN). Specifically, RN first presents a Reciprocal Compensation (RC) module to acquire a compensation for each channel in both domains, based on cross-domain channel-wise correlations. RN then develops a Reciprocal Aggregation (RA) module to adaptively aggregate features with their cross-domain compensation components. As an alternative to BN, RN is better suited to UDA problems and can easily be integrated into popular domain adaptation methods. Experiments show that the proposed RN outperforms existing normalization counterparts by a large margin and helps state-of-the-art adaptation methods achieve better results. The source code is available at https://github.com/openning07/reciprocal-normalization-for-da.
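The reciprocal-compensation idea can be sketched numerically: compute channel-wise cross-domain correlations and use them to build, for each channel in one domain, a compensation from the other domain's channels. This is a heavy simplification for intuition only; the softmax weighting and mean-statistic compensation below are assumptions, not the paper's exact formulation.

```python
import numpy as np

# Toy reciprocal compensation: cross-domain channel correlations drive an
# attention-like mixing of the opposite domain's channel statistics.

def reciprocal_compensation(src, tgt):
    """src, tgt: (N, C) channel statistics (e.g., pooled activations)."""
    s = (src - src.mean(0)) / (src.std(0) + 1e-5)    # standardize per channel
    t = (tgt - tgt.mean(0)) / (tgt.std(0) + 1e-5)
    corr = s.T @ t / len(s)                          # (C, C) cross-domain correlation
    attn = np.exp(corr) / np.exp(corr).sum(axis=1, keepdims=True)
    comp_for_src = tgt.mean(0) @ attn.T              # target-informed compensation
    comp_for_tgt = src.mean(0) @ attn                # source-informed compensation
    return comp_for_src, comp_for_tgt

rng = np.random.default_rng(0)
cs, ct = reciprocal_compensation(rng.normal(size=(32, 8)),
                                 rng.normal(size=(32, 8)))
print(cs.shape, ct.shape)
```

The aggregation module would then blend each channel's own statistics with its compensation, rather than normalizing the two domains independently.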
With the rapid development of generative models, AI-based face manipulation technologies, known as DeepFakes, have become increasingly realistic. Such face forgery methods can attack any target, posing a new threat to personal privacy and property security. Moreover, misuse of synthesized videos presents potential dangers in many areas, such as identity harassment, pornography, and news rumors. Inspired by the fact that the spatial coherence and temporal consistency of physiological signals are destroyed in generated content, we attempt to find inconsistency patterns that distinguish real from synthesized videos in the variation of facial pixels, which is highly correlated with physiological information. Our method first applies Eulerian Video Magnification (EVM) at multiple Gaussian levels to the original video to enlarge the physiological variations caused by changes in facial blood volume, and then transforms the original and magnified videos into a Multi-scale Eulerian-Magnified Spatial-Temporal map (MemstMap), which can represent time-varying physiologically enhanced sequences at different octaves. These maps are then reshaped into frame patches column-wise and fed into a vision transformer to learn frame-level spatio-temporal descriptors. Finally, we aggregate the embedded features and output the probability that the video is real or fake. We validate our method on the FaceForensics++ and DeepFake Detection datasets. The results show that our model achieves excellent performance in forgery detection and exhibits outstanding generalization ability in cross-dataset settings.
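The Eulerian-magnification step can be illustrated on a single pixel's intensity over time: amplify only the temporal frequency band where the blood-volume pulse lives, so a faint pulsation becomes visible. Real EVM filters a Gaussian/Laplacian pyramid per spatial level; this 1-D FFT band-mask version (with made-up band edges and gain) only conveys the idea.

```python
import numpy as np

# Toy Eulerian-style magnification: boost a temporal frequency band of a
# pixel intensity signal via an FFT mask.

def magnify_band(signal, fps, lo_hz, hi_hz, alpha):
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fps)
    spec = np.fft.rfft(signal)
    band = (freqs >= lo_hz) & (freqs <= hi_hz)
    spec[band] *= (1.0 + alpha)                  # amplify the pulse band only
    return np.fft.irfft(spec, n=len(signal))

fps = 30.0
t = np.arange(300) / fps
pixel = 100.0 + 0.1 * np.sin(2 * np.pi * 1.2 * t)   # faint 1.2 Hz "pulse"
out = magnify_band(pixel, fps, 0.8, 2.0, alpha=20.0)
print(pixel.std(), out.std())                        # variation grows markedly
```

Stacking such magnified traces over facial regions and scales is, roughly, what the spatial-temporal map encodes before the transformer sees it.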
Supervised deep-learning (DL)-based reconstruction algorithms have shown state-of-the-art results for highly-undersampled dynamic Magnetic Resonance Imaging (MRI) reconstruction. However, the requirement for extensive high-quality ground-truth data hinders their application due to the generalization problem. Recently, Implicit Neural Representation (INR) has emerged as a powerful DL-based tool for solving inverse problems by characterizing the attributes of a signal as a continuous function of the corresponding coordinates in an unsupervised manner. In this work, we propose an INR-based method to improve dynamic MRI reconstruction from highly undersampled k-space data, which takes only spatiotemporal coordinates as inputs. Specifically, the proposed INR represents the dynamic MRI images as an implicit function and encodes them into neural networks. The weights of the network are learned from the sparsely-acquired (k, t)-space data alone, without external training datasets or prior images. Benefiting from the strong implicit continuity regularization of INR together with explicit regularization for low-rankness and sparsity, our proposed method outperforms the compared scan-specific methods at various acceleration factors. For example, experiments on retrospective cardiac cine datasets show an improvement of 5.5 to 7.1 dB in PSNR at extremely high accelerations (up to 41.6-fold). The high quality and inner continuity of the images provided by INR have great potential to further improve the spatiotemporal resolution of dynamic MRI without the need for any training data.
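The INR principle, a signal represented as a continuous function of its coordinates, fitted only to that signal's own sparse samples, can be shown in 1-D. A fixed sinusoidal feature bank with a linear least-squares readout stands in for the coordinate MLP here; the signal, sample counts, and frequency bank are all illustrative.

```python
import numpy as np

# Toy implicit representation: fit f(t) from sparse samples, then query it
# at arbitrary coordinates.

freqs = np.arange(1, 9)                              # fixed frequency bank

def features(t):
    proj = 2 * np.pi * np.outer(t, freqs)
    return np.concatenate([np.sin(proj), np.cos(proj)], axis=1)

rng = np.random.default_rng(0)
t_train = rng.uniform(0, 1, 80)                      # sparse "acquired" samples
y_train = np.sin(2 * np.pi * t_train)                # signal lies in the bank's span
w, *_ = np.linalg.lstsq(features(t_train), y_train, rcond=None)

t_query = np.linspace(0, 1, 200)                     # query any coordinate
y_hat = features(t_query) @ w
print(np.abs(y_hat - np.sin(2 * np.pi * t_query)).max())
```

In the MRI setting, the "coordinates" are (x, y, t), the fitting loss lives in (k, t)-space via the forward model, and the network's smoothness plays the role the small basis plays here.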
A computational graph in a deep neural network (DNN) denotes a specific data flow diagram (DFD) composed of many tensors and operators. Existing toolkits for visualizing computational graphs are not applicable when the structure is highly complicated and large-scale (e.g., BERT [1]). To address this problem, we propose leveraging a suite of visual simplification techniques, including a cycle-removing method, a module-based edge-pruning algorithm, and an isomorphic subgraph stacking strategy. We design and implement an interactive visualization system suitable for computational graphs with up to 10 thousand elements. Experimental results and usage scenarios demonstrate that our tool reduces the number of displayed elements by 60% on average and hence improves the performance of recognizing and diagnosing DNN models. Our contributions are integrated into an open-source DNN visualization toolkit, namely MindInsight [2].
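Of the three techniques, the cycle-removing step is the easiest to sketch: a depth-first search over the graph drops back edges so the remaining edges form a DAG that a layered layout can draw. This DFS-back-edge approach is a standard simplification and not necessarily the tool's exact method.

```python
# Cycle removal via DFS: edges closing a cycle (back edges) are dropped.

def remove_cycles(nodes, edges):
    """Return edges minus DFS back edges. edges: iterable of (u, v) pairs."""
    adj = {n: [] for n in nodes}
    for u, v in edges:
        adj[u].append(v)
    state = {n: 0 for n in nodes}        # 0 unvisited, 1 on stack, 2 done
    kept = []

    def dfs(u):
        state[u] = 1
        for v in adj[u]:
            if state[v] == 1:            # back edge: closes a cycle, drop it
                continue
            kept.append((u, v))
            if state[v] == 0:
                dfs(v)
        state[u] = 2

    for n in nodes:
        if state[n] == 0:
            dfs(n)
    return kept

print(remove_cycles("abc", [("a", "b"), ("b", "c"), ("c", "a")]))
```

Edge pruning and isomorphic-subgraph stacking then shrink the resulting DAG further before rendering.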
Despite the surprising few-shot performance of in-context learning (ICL), it is still common practice to randomly sample examples to serve as context. This paper advocates a new principle for ICL: self-adaptive in-context learning. The self-adaptation mechanism is introduced to help each sample find an in-context example permutation (i.e., selection and ordering) that can derive the correct prediction, thus maximizing performance. To validate the effectiveness of self-adaptive ICL, we propose a general select-then-rank framework and instantiate it with new selection and ranking algorithms. In an extensive evaluation on eight different NLP datasets, our self-adaptive ICL method achieves a 40% relative improvement over the common-practice setting. Further analysis reveals the enormous potential of self-adaptive ICL: it might be able to close the gap between ICL and finetuning given more advanced algorithms. Our code is released to facilitate future research in this area: https://github.com/Shark-NLP/self-adaptive-ICL
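The select-then-rank framework can be sketched per test sample: first select demonstrations close to the input, then rank candidate orderings of them with a scoring function. The word-overlap selector and the toy length-based score below are stand-ins for the paper's actual selection and ranking algorithms.

```python
import itertools

# Select-then-rank sketch: choose k nearby demonstrations, then pick the
# highest-scoring permutation of them as the in-context example ordering.

def select(pool, query, k):
    """Pick the k examples sharing the most words with the query."""
    overlap = lambda ex: len(set(ex[0].split()) & set(query.split()))
    return sorted(pool, key=overlap, reverse=True)[:k]

def rank_permutations(candidates, score):
    """Return the highest-scoring ordering of the selected examples."""
    return max(itertools.permutations(candidates), key=score)

pool = [("the movie was great", "pos"),
        ("terrible plot and acting", "neg"),
        ("the movie dragged on", "neg")]
chosen = select(pool, "the movie felt great", k=2)
best = rank_permutations(chosen, score=lambda p: -len(p[0][0]))  # toy criterion
print([ex[1] for ex in best])
```

In the paper's setting, the scoring function would query the LLM itself (e.g., via output confidence), which is what makes the adaptation "self-" driven.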